Appearance-and-Relation Networks for Video Classification

نویسندگان

Limin Wang

Wei Li

Wen Li

Luc Van Gool

چکیده

Spatiotemporal feature learning in videos is a fundamental and difficult problem in computer vision. This paper presents a new architecture, termed as Appearanceand-Relation Network (ARTNet), to learn video representation in an end-to-end manner. ARTNets are constructed by stacking multiple generic building blocks, called as SMART, whose goal is to simultaneously model appearance and relation from RGB input in a separate and explicit manner. Specifically, SMART blocks decouple the spatiotemporal learning module into an appearance branch for spatial modeling and a relation branch for temporal modeling. The appearance branch is implemented based on the linear combination of pixels or filter responses in each frame, while the relation branch is designed based on the multiplicative interactions between pixels or filter responses across multiple frames. We perform experiments on three action recognition benchmarks: Kinetics, UCF101, and HMDB51, demonstrating that SMART blocks obtain an evident improvement over 3D convolutions for spatiotemporal feature learning. Under the same training setting, ARTNets achieve superior performance on these three datasets to the existing state-of-the-art methods. 1

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Hand Gesture Recognition from RGB-D Data using 2D and 3D Convolutional Neural Networks: a comparative study

Despite considerable enhances in recognizing hand gestures from still images, there are still many challenges in the classification of hand gestures in videos. The latter comes with more challenges, including higher computational complexity and arduous task of representing temporal features. Hand movement dynamics, represented by temporal features, have to be extracted by analyzing the total fr...

متن کامل

The Feedback Based Mechanism for Video Streaming Over Multipath Ad Hoc Networks

Ad hoc networks are multi-hop wireless networks without a pre-installed infrastructure. Such networks are widely used in military applications and in emergency situations as they permit the establishment of a communication network at very short notice with a very low cost. Video is very sensitive for packet loss and wireless ad-hoc networks are error prone due to node mobility and weak links. H...

متن کامل

A Novel Caching Strategy in Video-on-Demand (VoD) Peer-to-Peer (P2P) Networks Based on Complex Network Theory

The popularity of video-on-demand (VoD) streaming has grown dramatically over the World Wide Web. Most users in VoD P2P networks have to wait a long time in order to access their requesting videos. Therefore, reducing waiting time to access videos is the main challenge for VoD P2P networks. In this paper, we propose a novel algorithm for caching video based on peers' priority and video's popula...

متن کامل

A Comparative Study of Gender and Age Classification in Speech Signals

Accurate gender classification is useful in speech and speaker recognition as well as speech emotion classification, because a better performance has been reported when separate acoustic models are employed for males and females. Gender classification is also apparent in face recognition, video summarization, human-robot interaction, etc. Although gender classification is rather mature in a...

متن کامل

A Novel Caching Strategy in Video-on-Demand (VoD) Peer-to-Peer (P2P) Networks Based on Complex Network Theory

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

CoRR

دوره abs/1711.09125 شماره

صفحات -

تاریخ انتشار 2017

Appearance-and-Relation Networks for Video Classification

نویسندگان

چکیده

منابع مشابه

Hand Gesture Recognition from RGB-D Data using 2D and 3D Convolutional Neural Networks: a comparative study

The Feedback Based Mechanism for Video Streaming Over Multipath Ad Hoc Networks

A Novel Caching Strategy in Video-on-Demand (VoD) Peer-to-Peer (P2P) Networks Based on Complex Network Theory

A Comparative Study of Gender and Age Classification in Speech Signals

A Novel Caching Strategy in Video-on-Demand (VoD) Peer-to-Peer (P2P) Networks Based on Complex Network Theory

عنوان ژورنال:

اشتراک گذاری